tidyverse
August 2019, UC Berkeley
Chris Paciorek (based on materials developed by Dana Seidel, Kellie Ottoboni, Rochelle Terman, Nima Hejazi, and Chris Krogslund)
It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data. (Dasu and Johnson, 2003)
Thus before you can even get to doing any sort of sophisticated analysis or plotting, you'll generally first need to:
There are two competing schools of thought within the R community.
One school holds that tidyverse uses syntax that's unlike base R and is superfluous.
The other favors tidyverse tools because they are straightforward to use, more readable than base R, and speed up the tidying process.
We'll show you some of the tidyverse tools so you can make an informed decision about whether you want to use base R or these newfangled packages.
So far, you've seen the basics of manipulating data frames, e.g. subsetting, merging, and basic calculations. For instance, we can use base R functions to calculate summary statistics across groups of observations, e.g., the mean GDP per capita within each region:
mean(gap[gap$continent == "Africa", "gdpPercap"])
## [1] 2193.755
mean(gap[gap$continent == "Americas", "gdpPercap"])
## [1] 7136.11
mean(gap[gap$continent == "Asia", "gdpPercap"])
## [1] 7902.15
But this isn't ideal because it involves a fair bit of repetition. Repeating yourself will cost you time, both now and later, and potentially introduce some nasty bugs.
dplyr
Luckily, the dplyr package provides a number of very useful functions for manipulating data frames. These functions will save you time by reducing repetition. As an added bonus, you might even find the dplyr grammar easier to read.
Here we're going to cover 6 of the most commonly used functions as well as using pipes (%>%) to combine them.
select()
filter()
group_by()
summarize()
mutate()
arrange()
If you have not installed this package earlier, please do so now:
# NOT run
install.packages('dplyr')
Now let's load the package:
library(dplyr)
dplyr::select
Imagine that we just received the gapminder dataset, but are only interested in a few variables in it. We could use the select() function to keep only the columns corresponding to variables we select.
year_country_gdp_dplyr <- select(gap, year, country, gdpPercap)
head(year_country_gdp_dplyr)
## year country gdpPercap
## 1 1952 Afghanistan 779.4453
## 2 1957 Afghanistan 820.8530
## 3 1962 Afghanistan 853.1007
## 4 1967 Afghanistan 836.1971
## 5 1972 Afghanistan 739.9811
## 6 1977 Afghanistan 786.1134
If we open up year_country_gdp_dplyr, we'll see that it only contains the year, country and gdpPercap columns. This is equivalent to base R subsetting:
year_country_gdp_base <- gap[,c("year", "country", "gdpPercap")]
head(year_country_gdp_base)
## year country gdpPercap
## 1 1952 Afghanistan 779.4453
## 2 1957 Afghanistan 820.8530
## 3 1962 Afghanistan 853.1007
## 4 1967 Afghanistan 836.1971
## 5 1972 Afghanistan 739.9811
## 6 1977 Afghanistan 786.1134
We can even check that these two data frames are equivalent:
# checking equivalence: TRUE indicates an exact match between these objects
all.equal(year_country_gdp_dplyr, year_country_gdp_base)
## [1] TRUE
But, as we will see, dplyr makes for much more readable, efficient code because of its pipe operator.
dplyr
Above, we used what's called "normal" grammar, but the strengths of dplyr lie in combining several functions using pipes.
In typical base R code, a simple operation might be written like:
# NOT run
cupcakes <- bake(pour(mix(ingredients)))
A computer has no trouble understanding this and your cupcakes will be made just fine, but a person has to read right to left to understand the order of operations (the opposite of how most western languages are read), making it harder to understand what is being done!
To be more readable without pipes, we might break up this code into intermediate objects...
## NOT run
batter <- mix(ingredients)
muffin_tin <- pour(batter)
cupcakes <- bake(muffin_tin)
...but this can clutter our environment with a lot of variables that aren't very useful to us and that are often given very similar names (e.g. step, step1, step2...), which can lead to confusion and those hard-to-track-down bugs.
The pipe makes it easier to read code because it lays out the operations left to right so each line can be read like a line of a recipe for the perfect data frame!
Pipes take the input on the left side of the %>% symbol and pass it in as the first argument to the function on the right side.
With pipes, our cupcake example might be written like:
## NOT run
cupcakes <- ingredients %>%
  mix() %>%
  pour() %>%
  bake()
Pro Tip: In RStudio the hotkey for the pipe is Ctrl + Shift + M.
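To make the "first argument" rule concrete, here is a minimal sketch that runs the same calculation twice, once with nested calls and once with pipes. The vector x is a made-up toy input, not part of the gapminder data.

```r
library(dplyr)  # attaching dplyr also makes the %>% pipe available

# Hypothetical toy vector, used only for illustration
x <- c(4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: each result becomes the first argument of the next function,
# so x %>% round(1) means round(x, 1)
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both versions compute the same value
identical(nested, piped)
```

Extra arguments, such as the 1 passed to round(), simply follow the piped-in value.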
select & Pipe (%>%)
Since the pipe grammar is unlike anything we've seen in R before, let's repeat what we did above with the gapminder dataset using pipes:
year_country_gdp <- gap %>% select(year, country, gdpPercap)
First, we summon the gapminder data frame and pass it on to the next step using the pipe symbol %>%. The second step is the select() function. In this case we don't specify which data object we use in the call to select() since we've piped it in.
Fun Fact: There is a good chance you have encountered pipes before in the shell. In R, a pipe symbol is %>% while in the shell it is |. But the concept is the same!
dplyr::filter
Now let's say we're only interested in African countries. We can combine select and filter to select only the observations where continent is Africa.
year_country_gdp_africa <- gap %>%
  filter(continent == "Africa") %>%
  select(year, country, gdpPercap)
As with last time, first we pass the gapminder data frame to the filter() function, then we pass the filtered version of the gapminder data frame to the select() function.
To clarify, both the select and filter functions subset the data frame. The difference is that select extracts certain columns, while filter extracts certain rows.
Note: The order of operations is very important in this case. If we used 'select' first, filter would not be able to find the variable continent since we would have removed it in the previous step.
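The sketch below illustrates this ordering issue on a small made-up data frame (toy, with invented values, standing in for gap). It wraps the failing pipeline in tryCatch() so that the error is recorded rather than stopping the script.

```r
library(dplyr)

# Hypothetical toy data frame standing in for gap (values are invented)
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Africa", "Africa", "Americas", "Americas"),
  gdpPercap = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# filter() first: continent is still available when the rows are chosen
ok <- toy %>%
  filter(continent == "Africa") %>%
  select(country, gdpPercap)

# select() first: continent has been dropped, so filter() cannot find it.
# tryCatch() turns the resulting error into a TRUE/FALSE flag.
select_first_failed <- tryCatch({
  toy %>%
    select(country, gdpPercap) %>%
    filter(continent == "Africa")
  FALSE
}, error = function(e) TRUE)

nrow(ok)
select_first_failed
```

If you need to filter on a column you don't want in the final result, filter first and drop the column afterwards.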
dplyr Calculations Across Groups
A common task you'll encounter when working with data is running calculations on different groups within the data. For instance, what if we wanted to calculate the mean GDP per capita for each continent?
In base R, you would have to run the mean() function for each subset of data:
mean(gap$gdpPercap[gap$continent == "Africa"])
## [1] 2193.755
mean(gap$gdpPercap[gap$continent == "Americas"])
## [1] 7136.11
mean(gap$gdpPercap[gap$continent == "Asia"])
## [1] 7902.15
mean(gap$gdpPercap[gap$continent == "Europe"])
## [1] 14469.48
mean(gap$gdpPercap[gap$continent == "Oceania"])
## [1] 18621.61
That's a lot of repetition! To make matters worse, what if we wanted to add these values to our original data frame as a new column? We would have to write something like this:
gap$mean.continent.GDP <- NA
gap$mean.continent.GDP[gap$continent == "Africa"] <- mean(gap$gdpPercap[gap$continent == "Africa"])
gap$mean.continent.GDP[gap$continent == "Americas"] <- mean(gap$gdpPercap[gap$continent == "Americas"])
gap$mean.continent.GDP[gap$continent == "Asia"] <- mean(gap$gdpPercap[gap$continent == "Asia"])
gap$mean.continent.GDP[gap$continent == "Europe"] <- mean(gap$gdpPercap[gap$continent == "Europe"])
gap$mean.continent.GDP[gap$continent == "Oceania"] <- mean(gap$gdpPercap[gap$continent == "Oceania"])
You can see how this can get pretty tedious, especially if we want to calculate more complicated or refined statistics. We could use loops or apply functions, but these can be difficult, slow, or error-prone.
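For comparison, here is roughly what the loop version might look like. This is only a sketch on a made-up data frame (toy, with invented values) rather than on gap, and the column name mean_continent_gdp is chosen just for this example.

```r
# Hypothetical toy data frame standing in for gap (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(10, 30, 20, 40, 60),
  stringsAsFactors = FALSE
)

# Loop over the distinct continents instead of writing one line per continent
toy$mean_continent_gdp <- NA_real_
for (cont in unique(toy$continent)) {
  rows <- toy$continent == cont                 # logical index for this group
  toy$mean_continent_gdp[rows] <- mean(toy$gdpPercap[rows])
}

toy
```

The loop removes the copy-and-paste, but you still have to manage the index and the pre-allocated column yourself.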
dplyr split-apply-combine
The abstract problem we're encountering here is known as "split-apply-combine":
We want to split our data into groups (in this case continents), apply some calculations on each group, then combine the results together afterwards.
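To see the three steps individually, the sketch below spells them out in base R on a made-up data frame (toy, with invented values). It is one possible way to write it, not the only one.

```r
# Hypothetical toy data frame standing in for gap (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(10, 30, 20, 40, 60),
  stringsAsFactors = FALSE
)

# Split: one vector of gdpPercap values per continent
pieces <- split(toy$gdpPercap, toy$continent)

# Apply: compute the mean within each piece
means <- sapply(pieces, mean)

# Combine: assemble the per-group results into one data frame
result <- data.frame(
  continent      = names(means),
  mean_gdpPercap = unname(means),
  stringsAsFactors = FALSE
)

result
```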
Module 4 gave some ways to do split-apply-combine type stuff using the apply family of functions, but those are error prone and messy.
Luckily, dplyr offers a much cleaner, more straightforward solution to this problem.
# remove this column -- there are two easy ways!
gap <- gap %>% select(-mean.continent.GDP)
# OR
gap$mean.continent.GDP <- NULL
dplyr::group_by
We've already seen how filter() can help us select observations that meet certain criteria (in the above: continent == "Africa"). More helpful, however, is the group_by() function, which will essentially use every unique criterion that we could have used in filter().
A grouped_df can be thought of as a list where each item in the list is a data.frame that contains only the rows corresponding to a particular value of continent (at least in the example above).
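As a rough illustration of that mental model, the sketch below groups a made-up data frame (toy, with invented values) and inspects the result. It assumes dplyr 0.8.0 or later, where group_keys() and group_split() are available.

```r
library(dplyr)

# Hypothetical toy data frame standing in for gap (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(10, 30, 20, 40, 60),
  stringsAsFactors = FALSE
)

grouped <- toy %>% group_by(continent)

class(grouped)       # includes "grouped_df"
n_groups(grouped)    # number of distinct continents
group_keys(grouped)  # one row per group

# The "list of data frames" view: one element per continent
pieces <- group_split(grouped)
length(pieces)
```

Note that group_by() does not change the rows or columns themselves; it only records which rows belong together.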
dplyr::summarize
group_by() on its own is not particularly interesting. It's much more exciting used in conjunction with the summarize() function. This will allow us to create new variable(s) by applying transformations to variables in each of the continent-specific data frames. In other words, using the group_by() function, we split our original data frame into multiple pieces, to which we then apply summary functions (e.g. mean() or sd()) within summarize(). The output is a new data frame reduced in size, with one row per group.
gdp_bycontinents <- gap %>%
  group_by(continent) %>%
  summarize(mean_gdpPercap = mean(gdpPercap))
head(gdp_bycontinents)
## # A tibble: 5 x 2
## continent mean_gdpPercap
## <chr> <dbl>
## 1 Africa 2194.
## 2 Americas 7136.
## 3 Asia 7902.
## 4 Europe 14469.
## 5 Oceania 18622.
That allowed us to calculate the mean gdpPercap for each continent. But it gets even better -- the function group_by() allows us to group by multiple variables. Let's group by year and continent.
gdp_bycontinents_byyear <- gap %>%
  group_by(continent, year) %>%
  summarize(mean_gdpPercap = mean(gdpPercap))
head(gdp_bycontinents_byyear)
## # A tibble: 6 x 3
## # Groups: continent [1]
## continent year mean_gdpPercap
## <chr> <int> <dbl>
## 1 Africa 1952 1253.
## 2 Africa 1957 1385.
## 3 Africa 1962 1598.
## 4 Africa 1967 2050.
## 5 Africa 1972 2340.
## 6 Africa 1977 2586.
That is already quite powerful, but it gets even better! You're not limited to defining 1 new variable in summarize().
gdp_pop_bycontinents_byyear <- gap %>%
  group_by(continent, year) %>%
  summarize(mean_gdpPercap = mean(gdpPercap),
            sd_gdpPercap = sd(gdpPercap),
            mean_pop = mean(pop),
            sd_pop = sd(pop))
head(gdp_pop_bycontinents_byyear)
## # A tibble: 6 x 6
## # Groups: continent [1]
## continent year mean_gdpPercap sd_gdpPercap mean_pop sd_pop
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Africa 1952 1253. 983. 4570010. 6317450.
## 2 Africa 1957 1385. 1135. 5093033. 7076042.
## 3 Africa 1962 1598. 1462. 5702247. 7957545.
## 4 Africa 1967 2050. 2848. 6447875. 8985505.
## 5 Africa 1972 2340. 3287. 7305376. 10130833.
## 6 Africa 1977 2586. 4142. 8328097. 11585184.
dplyr::mutate
What if we wanted to add these values to our original data frame instead of creating a new object? For this, we can use the mutate() function, which is similar to summarize() except it creates new variables in the same data frame that you pass into it.
gap_with_extra_vars <- gap %>%
  group_by(continent, year) %>%
  mutate(mean_gdpPercap = mean(gdpPercap),
         sd_gdpPercap = sd(gdpPercap),
         mean_pop = mean(pop),
         sd_pop = sd(pop))
head(gap_with_extra_vars)
## # A tibble: 6 x 10
## # Groups: continent, year [6]
## country year pop continent lifeExp gdpPercap mean_gdpPercap
## <chr> <int> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 Afghan… 1952 8.43e6 Asia 28.8 779. 5195.
## 2 Afghan… 1957 9.24e6 Asia 30.3 821. 5788.
## 3 Afghan… 1962 1.03e7 Asia 32.0 853. 5729.
## 4 Afghan… 1967 1.15e7 Asia 34.0 836. 5971.
## 5 Afghan… 1972 1.31e7 Asia 36.1 740. 8187.
## 6 Afghan… 1977 1.49e7 Asia 38.4 786. 7791.
## # … with 3 more variables: sd_gdpPercap <dbl>, mean_pop <dbl>,
## # sd_pop <dbl>
We can also use mutate() to create new variables prior to (or even after) summarizing information. Note that mutate() does not need to operate on grouped data and it can do element-wise transformations.
gdp_pop_bycontinents_byyear <- gap %>%
  mutate(gdp_billion = gdpPercap*pop/10^9) %>%
  group_by(continent, year) %>%
  summarize(mean_gdpPercap = mean(gdpPercap),
            sd_gdpPercap = sd(gdpPercap),
            mean_pop = mean(pop),
            sd_pop = sd(pop),
            mean_gdp_billion = mean(gdp_billion),
            sd_gdp_billion = sd(gdp_billion))
head(gdp_pop_bycontinents_byyear)
## # A tibble: 6 x 8
## # Groups: continent [1]
## continent year mean_gdpPercap sd_gdpPercap mean_pop sd_pop
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Africa 1952 1253. 983. 4570010. 6.32e6
## 2 Africa 1957 1385. 1135. 5093033. 7.08e6
## 3 Africa 1962 1598. 1462. 5702247. 7.96e6
## 4 Africa 1967 2050. 2848. 6447875. 8.99e6
## 5 Africa 1972 2340. 3287. 7305376. 1.01e7
## 6 Africa 1977 2586. 4142. 8328097. 1.16e7
## # … with 2 more variables: mean_gdp_billion <dbl>, sd_gdp_billion <dbl>
mutate vs. summarize
It can be confusing to decide whether to use mutate or summarize. The key distinction is whether you want the output to have one row for each group or one row for each row in the original data frame:
mutate: creates new columns with as many rows as the original data frame
summarize: creates a data frame with as many rows as groups
Note that if you use an aggregation function such as mean() within mutate() without using group_by(), you'll simply do the summary over all the rows of the input data frame.
And if you use an aggregation function such as mean() within summarize() without using group_by(), you'll simply create an output data frame with one row (i.e., the whole input data frame is a single group).
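A quick way to check these row-count rules is to try all four combinations on a small made-up data frame (toy, with invented values) and compare the number of rows in each result. The sketch below does this:

```r
library(dplyr)

# Hypothetical toy data frame standing in for gap (values are invented)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(10, 30, 20, 40, 60),
  stringsAsFactors = FALSE
)

# summarize() after group_by(): one row per group
per_group <- toy %>%
  group_by(continent) %>%
  summarize(mean_gdpPercap = mean(gdpPercap))

# mutate() after group_by(): one row per original row,
# with the group mean repeated within each group
per_row <- toy %>%
  group_by(continent) %>%
  mutate(mean_gdpPercap = mean(gdpPercap))

# Without group_by(), the whole data frame acts as a single group
overall_row <- toy %>% summarize(mean_gdpPercap = mean(gdpPercap))
overall_col <- toy %>% mutate(mean_gdpPercap = mean(gdpPercap))

c(per_group   = nrow(per_group),
  per_row     = nrow(per_row),
  overall_row = nrow(overall_row),
  overall_col = nrow(overall_col))
```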
dplyr::arrange
As a last step, let's say we want to sort the rows in our data frame according to values in a certain column. We can use the arrange() function to do this. For instance, let's organize our rows by year (recent first), and then by continent.
gap_with_extra_vars <- gap %>%
  group_by(continent, year) %>%
  mutate(mean_gdpPercap = mean(gdpPercap),
         sd_gdpPercap = sd(gdpPercap),
         mean_pop = mean(pop),
         sd_pop = sd(pop)) %>%
  arrange(desc(year), continent)
head(gap_with_extra_vars)
## # A tibble: 6 x 10
## # Groups: continent, year [1]
## country year pop continent lifeExp gdpPercap mean_gdpPercap
## <chr> <int> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 Algeria 2007 3.33e7 Africa 72.3 6223. 3089.
## 2 Angola 2007 1.24e7 Africa 42.7 4797. 3089.
## 3 Benin 2007 8.08e6 Africa 56.7 1441. 3089.
## 4 Botswa… 2007 1.64e6 Africa 50.7 12570. 3089.
## 5 Burkin… 2007 1.43e7 Africa 52.3 1217. 3089.
## 6 Burundi 2007 8.39e6 Africa 49.6 430. 3089.
## # … with 3 more variables: sd_gdpPercap <dbl>, mean_pop <dbl>,
## # sd_pop <dbl>
dplyr Take-aways
# without pipes:
gap_with_extra_vars <- arrange(
  mutate(
    group_by(gap, continent, year),
    mean_gdpPercap = mean(gdpPercap)
  ),
  desc(year), continent)
You may run across the term "non-standard evaluation". The use of data frame variables without quotes around them is an example of this.
Why is this strange?
gap %>% select(continent, year) %>% tail()
Compare it to:
gap[ , c('continent', 'year')]
## continent year
## 1 Asia 1952
## 2 Asia 1957
## 3 Asia 1962
## 4 Asia 1967
## 5 Asia 1972
## 6 Asia 1977
## 7 Asia 1982
## 8 Asia 1987
## 9 Asia 1992
## 10 Asia 1997
## 11 Asia 2002
## 12 Asia 2007
## 13 Europe 1952
## 14 Europe 1957
## 15 Europe 1962
## 16 Europe 1967
## 17 Europe 1972
## 18 Europe 1977
## 19 Europe 1982
## 20 Europe 1987
## 21 Europe 1992
## 22 Europe 1997
## 23 Europe 2002
## 24 Europe 2007
## 25 Africa 1952
## 26 Africa 1957
## 27 Africa 1962
## 28 Africa 1967
## 29 Africa 1972
## 30 Africa 1977
## 31 Africa 1982
## 32 Africa 1987
## 33 Africa 1992
## 34 Africa 1997
## 35 Africa 2002
## 36 Africa 2007
## 37 Africa 1952
## 38 Africa 1957
## 39 Africa 1962
## 40 Africa 1967
## 41 Africa 1972
## 42 Africa 1977
## 43 Africa 1982
## 44 Africa 1987
## 45 Africa 1992
## 46 Africa 1997
## 47 Africa 2002
## 48 Africa 2007
## 49 Americas 1952
## 50 Americas 1957
## 51 Americas 1962
## 52 Americas 1967
## 53 Americas 1972
## 54 Americas 1977
## 55 Americas 1982
## 56 Americas 1987
## 57 Americas 1992
## 58 Americas 1997
## 59 Americas 2002
## 60 Americas 2007
## 61 Oceania 1952
## 62 Oceania 1957
## 63 Oceania 1962
## 64 Oceania 1967
## 65 Oceania 1972
## 66 Oceania 1977
## 67 Oceania 1982
## 68 Oceania 1987
## 69 Oceania 1992
## 70 Oceania 1997
## 71 Oceania 2002
## 72 Oceania 2007
## 73 Europe 1952
## 74 Europe 1957
## 75 Europe 1962
## 76 Europe 1967
## 77 Europe 1972
## 78 Europe 1977
## 79 Europe 1982
## 80 Europe 1987
## 81 Europe 1992
## 82 Europe 1997
## 83 Europe 2002
## 84 Europe 2007
## 85 Asia 1952
## 86 Asia 1957
## 87 Asia 1962
## 88 Asia 1967
## 89 Asia 1972
## 90 Asia 1977
## 91 Asia 1982
## 92 Asia 1987
## 93 Asia 1992
## 94 Asia 1997
## 95 Asia 2002
## 96 Asia 2007
## 97 Asia 1952
## 98 Asia 1957
## 99 Asia 1962
## 100 Asia 1967
## 101 Asia 1972
## 102 Asia 1977
## 103 Asia 1982
## 104 Asia 1987
## 105 Asia 1992
## 106 Asia 1997
## 107 Asia 2002
## 108 Asia 2007
## 109 Europe 1952
## 110 Europe 1957
## 111 Europe 1962
## 112 Europe 1967
## 113 Europe 1972
## 114 Europe 1977
## 115 Europe 1982
## 116 Europe 1987
## 117 Europe 1992
## 118 Europe 1997
## 119 Europe 2002
## 120 Europe 2007
## 121 Africa 1952
## 122 Africa 1957
## 123 Africa 1962
## 124 Africa 1967
## 125 Africa 1972
## 126 Africa 1977
## 127 Africa 1982
## 128 Africa 1987
## 129 Africa 1992
## 130 Africa 1997
## 131 Africa 2002
## 132 Africa 2007
## 133 Americas 1952
## 134 Americas 1957
## 135 Americas 1962
## 136 Americas 1967
## 137 Americas 1972
## 138 Americas 1977
## 139 Americas 1982
## 140 Americas 1987
## 141 Americas 1992
## 142 Americas 1997
## 143 Americas 2002
## 144 Americas 2007
## 145 Europe 1952
## 146 Europe 1957
## 147 Europe 1962
## 148 Europe 1967
## 149 Europe 1972
## 150 Europe 1977
## 151 Europe 1982
## 152 Europe 1987
## 153 Europe 1992
## 154 Europe 1997
## 155 Europe 2002
## 156 Europe 2007
## 157 Africa 1952
## 158 Africa 1957
## 159 Africa 1962
## 160 Africa 1967
## 161 Africa 1972
## 162 Africa 1977
## 163 Africa 1982
## 164 Africa 1987
## 165 Africa 1992
## 166 Africa 1997
## 167 Africa 2002
## 168 Africa 2007
## 169 Americas 1952
## 170 Americas 1957
## 171 Americas 1962
## 172 Americas 1967
## 173 Americas 1972
## 174 Americas 1977
## 175 Americas 1982
## 176 Americas 1987
## 177 Americas 1992
## 178 Americas 1997
## 179 Americas 2002
## 180 Americas 2007
## 181 Europe 1952
## 182 Europe 1957
## 183 Europe 1962
## 184 Europe 1967
## 185 Europe 1972
## 186 Europe 1977
## 187 Europe 1982
## 188 Europe 1987
## 189 Europe 1992
## 190 Europe 1997
## 191 Europe 2002
## 192 Europe 2007
## 193 Africa 1952
## 194 Africa 1957
## 195 Africa 1962
## 196 Africa 1967
## 197 Africa 1972
## 198 Africa 1977
## 199 Africa 1982
## 200 Africa 1987
## 201 Africa 1992
## 202 Africa 1997
## 203 Africa 2002
## 204 Africa 2007
## 205 Africa 1952
## 206 Africa 1957
## 207 Africa 1962
## 208 Africa 1967
## 209 Africa 1972
## 210 Africa 1977
## 211 Africa 1982
## 212 Africa 1987
## 213 Africa 1992
## 214 Africa 1997
## 215 Africa 2002
## 216 Africa 2007
## 217 Asia 1952
## 218 Asia 1957
## 219 Asia 1962
## 220 Asia 1967
## 221 Asia 1972
## 222 Asia 1977
## 223 Asia 1982
## 224 Asia 1987
## 225 Asia 1992
## 226 Asia 1997
## 227 Asia 2002
## 228 Asia 2007
## 229 Africa 1952
## 230 Africa 1957
## 231 Africa 1962
## 232 Africa 1967
## 233 Africa 1972
## 234 Africa 1977
## 235 Africa 1982
## 236 Africa 1987
## 237 Africa 1992
## 238 Africa 1997
## 239 Africa 2002
## 240 Africa 2007
## 241 Americas 1952
## 242 Americas 1957
## 243 Americas 1962
## 244 Americas 1967
## 245 Americas 1972
## 246 Americas 1977
## 247 Americas 1982
## 248 Americas 1987
## 249 Americas 1992
## 250 Americas 1997
## 251 Americas 2002
## 252 Americas 2007
## 253 Africa 1952
## 254 Africa 1957
## 255 Africa 1962
## 256 Africa 1967
## 257 Africa 1972
## 258 Africa 1977
## 259 Africa 1982
## 260 Africa 1987
## 261 Africa 1992
## 262 Africa 1997
## 263 Africa 2002
## 264 Africa 2007
## 265 Africa 1952
## 266 Africa 1957
## 267 Africa 1962
## 268 Africa 1967
## 269 Africa 1972
## 270 Africa 1977
## 271 Africa 1982
## 272 Africa 1987
## 273 Africa 1992
## 274 Africa 1997
## 275 Africa 2002
## 276 Africa 2007
## 277 Americas 1952
## 278 Americas 1957
## 279 Americas 1962
## 280 Americas 1967
## 281 Americas 1972
## 282 Americas 1977
## 283 Americas 1982
## 284 Americas 1987
## 285 Americas 1992
## 286 Americas 1997
## 287 Americas 2002
## 288 Americas 2007
## 289 Asia 1952
## 290 Asia 1957
## 291 Asia 1962
## 292 Asia 1967
## 293 Asia 1972
## 294 Asia 1977
## 295 Asia 1982
## 296 Asia 1987
## 297 Asia 1992
## 298 Asia 1997
## 299 Asia 2002
## 300 Asia 2007
## 301 Americas 1952
## 302 Americas 1957
## 303 Americas 1962
## 304 Americas 1967
## 305 Americas 1972
## 306 Americas 1977
## 307 Americas 1982
## 308 Americas 1987
## 309 Americas 1992
## 310 Americas 1997
## 311 Americas 2002
## 312 Americas 2007
## 313 Africa 1952
## 314 Africa 1957
## 315 Africa 1962
## 316 Africa 1967
## 317 Africa 1972
## 318 Africa 1977
## 319 Africa 1982
## 320 Africa 1987
## 321 Africa 1992
## 322 Africa 1997
## 323 Africa 2002
## 324 Africa 2007
## 325 Africa 1952
## 326 Africa 1957
## 327 Africa 1962
## 328 Africa 1967
## 329 Africa 1972
## 330 Africa 1977
## 331 Africa 1982
## 332 Africa 1987
## 333 Africa 1992
## 334 Africa 1997
## 335 Africa 2002
## 336 Africa 2007
## 337 Africa 1952
## 338 Africa 1957
## 339 Africa 1962
## 340 Africa 1967
## 341 Africa 1972
## 342 Africa 1977
## 343 Africa 1982
## 344 Africa 1987
## 345 Africa 1992
## 346 Africa 1997
## 347 Africa 2002
## 348 Africa 2007
## 349 Americas 1952
## 350 Americas 1957
## 351 Americas 1962
## 352 Americas 1967
## 353 Americas 1972
## 354 Americas 1977
## 355 Americas 1982
## 356 Americas 1987
## 357 Americas 1992
## 358 Americas 1997
## 359 Americas 2002
## 360 Americas 2007
## 361 Africa 1952
## 362 Africa 1957
## 363 Africa 1962
## 364 Africa 1967
## 365 Africa 1972
## 366 Africa 1977
## 367 Africa 1982
## 368 Africa 1987
## 369 Africa 1992
## 370 Africa 1997
## 371 Africa 2002
## 372 Africa 2007
## 373 Europe 1952
## 374 Europe 1957
## 375 Europe 1962
## 376 Europe 1967
## 377 Europe 1972
## 378 Europe 1977
## 379 Europe 1982
## 380 Europe 1987
## 381 Europe 1992
## 382 Europe 1997
## 383 Europe 2002
## 384 Europe 2007
## 385 Americas 1952
## 386 Americas 1957
## 387 Americas 1962
## 388 Americas 1967
## 389 Americas 1972
## 390 Americas 1977
## 391 Americas 1982
## 392 Americas 1987
## 393 Americas 1992
## 394 Americas 1997
## 395 Americas 2002
## 396 Americas 2007
## 397 Europe 1952
## 398 Europe 1957
## 399 Europe 1962
## 400 Europe 1967
## 401 Europe 1972
## 402 Europe 1977
## 403 Europe 1982
## 404 Europe 1987
## 405 Europe 1992
## 406 Europe 1997
## 407 Europe 2002
## 408 Europe 2007
## 409 Europe 1952
## 410 Europe 1957
## 411 Europe 1962
## 412 Europe 1967
## 413 Europe 1972
## 414 Europe 1977
## 415 Europe 1982
## 416 Europe 1987
## 417 Europe 1992
## 418 Europe 1997
## 419 Europe 2002
## 420 Europe 2007
## 421 Africa 1952
## 422 Africa 1957
## 423 Africa 1962
## 424 Africa 1967
## 425 Africa 1972
## 426 Africa 1977
## 427 Africa 1982
## 428 Africa 1987
## 429 Africa 1992
## 430 Africa 1997
## 431 Africa 2002
## 432 Africa 2007
## 433 Americas 1952
## 434 Americas 1957
## 435 Americas 1962
## 436 Americas 1967
## 437 Americas 1972
## 438 Americas 1977
## 439 Americas 1982
## 440 Americas 1987
## 441 Americas 1992
## 442 Americas 1997
## 443 Americas 2002
## 444 Americas 2007
## 445 Americas 1952
## 446 Americas 1957
## 447 Americas 1962
## 448 Americas 1967
## 449 Americas 1972
## 450 Americas 1977
## 451 Americas 1982
## 452 Americas 1987
## 453 Americas 1992
## 454 Americas 1997
## 455 Americas 2002
## 456 Americas 2007
## 457 Africa 1952
## 458 Africa 1957
## 459 Africa 1962
## 460 Africa 1967
## 461 Africa 1972
## 462 Africa 1977
## 463 Africa 1982
## 464 Africa 1987
## 465 Africa 1992
## 466 Africa 1997
## 467 Africa 2002
## 468 Africa 2007
## 469 Americas 1952
## 470 Americas 1957
## 471 Americas 1962
## 472 Americas 1967
## 473 Americas 1972
## 474 Americas 1977
## 475 Americas 1982
## 476 Americas 1987
## 477 Americas 1992
## 478 Americas 1997
## 479 Americas 2002
## 480 Americas 2007
## 481 Africa 1952
## 482 Africa 1957
## 483 Africa 1962
## 484 Africa 1967
## 485 Africa 1972
## 486 Africa 1977
## 487 Africa 1982
## 488 Africa 1987
## 489 Africa 1992
## 490 Africa 1997
## 491 Africa 2002
## 492 Africa 2007
## 493 Africa 1952
## 494 Africa 1957
## 495 Africa 1962
## 496 Africa 1967
## 497 Africa 1972
## 498 Africa 1977
## 499 Africa 1982
## 500 Africa 1987
## 501 Africa 1992
## 502 Africa 1997
## 503 Africa 2002
## 504 Africa 2007
## 505 Africa 1952
## 506 Africa 1957
## 507 Africa 1962
## 508 Africa 1967
## 509 Africa 1972
## 510 Africa 1977
## 511 Africa 1982
## 512 Africa 1987
## 513 Africa 1992
## 514 Africa 1997
## 515 Africa 2002
## 516 Africa 2007
## 517 Europe 1952
## 518 Europe 1957
## 519 Europe 1962
## 520 Europe 1967
## 521 Europe 1972
## 522 Europe 1977
## 523 Europe 1982
## 524 Europe 1987
## 525 Europe 1992
## 526 Europe 1997
## 527 Europe 2002
## 528 Europe 2007
## 529 Europe 1952
## 530 Europe 1957
## 531 Europe 1962
## 532 Europe 1967
## 533 Europe 1972
## 534 Europe 1977
## 535 Europe 1982
## 536 Europe 1987
## 537 Europe 1992
## 538 Europe 1997
## 539 Europe 2002
## 540 Europe 2007
## 541 Africa 1952
## 542 Africa 1957
## 543 Africa 1962
## 544 Africa 1967
## 545 Africa 1972
## 546 Africa 1977
## 547 Africa 1982
## 548 Africa 1987
## 549 Africa 1992
## 550 Africa 1997
## 551 Africa 2002
## 552 Africa 2007
## 553 Africa 1952
## 554 Africa 1957
## 555 Africa 1962
## 556 Africa 1967
## 557 Africa 1972
## 558 Africa 1977
## 559 Africa 1982
## 560 Africa 1987
## 561 Africa 1992
## 562 Africa 1997
## 563 Africa 2002
## 564 Africa 2007
## 565 Europe 1952
## 566 Europe 1957
## 567 Europe 1962
## 568 Europe 1967
## 569 Europe 1972
## 570 Europe 1977
## 571 Europe 1982
## 572 Europe 1987
## 573 Europe 1992
## 574 Europe 1997
## 575 Europe 2002
## 576 Europe 2007
## 577 Africa 1952
## 578 Africa 1957
## 579 Africa 1962
## 580 Africa 1967
## 581 Africa 1972
## 582 Africa 1977
## 583 Africa 1982
## 584 Africa 1987
## 585 Africa 1992
## 586 Africa 1997
## 587 Africa 2002
## 588 Africa 2007
## 589 Europe 1952
## 590 Europe 1957
## 591 Europe 1962
## 592 Europe 1967
## 593 Europe 1972
## 594 Europe 1977
## 595 Europe 1982
## 596 Europe 1987
## 597 Europe 1992
## 598 Europe 1997
## 599 Europe 2002
## 600 Europe 2007
## 601 Americas 1952
## 602 Americas 1957
## 603 Americas 1962
## 604 Americas 1967
## 605 Americas 1972
## 606 Americas 1977
## 607 Americas 1982
## 608 Americas 1987
## 609 Americas 1992
## 610 Americas 1997
## 611 Americas 2002
## 612 Americas 2007
## 613 Africa 1952
## 614 Africa 1957
## 615 Africa 1962
## 616 Africa 1967
## 617 Africa 1972
## 618 Africa 1977
## 619 Africa 1982
## 620 Africa 1987
## 621 Africa 1992
## 622 Africa 1997
## 623 Africa 2002
## 624 Africa 2007
## 625 Africa 1952
## 626 Africa 1957
## 627 Africa 1962
## 628 Africa 1967
## 629 Africa 1972
## 630 Africa 1977
## 631 Africa 1982
## 632 Africa 1987
## 633 Africa 1992
## 634 Africa 1997
## 635 Africa 2002
## 636 Africa 2007
## 637 Americas 1952
## 638 Americas 1957
## 639 Americas 1962
## 640 Americas 1967
## 641 Americas 1972
## 642 Americas 1977
## 643 Americas 1982
## 644 Americas 1987
## 645 Americas 1992
## 646 Americas 1997
## 647 Americas 2002
## 648 Americas 2007
## 649 Americas 1952
## 650 Americas 1957
## 651 Americas 1962
## 652 Americas 1967
## 653 Americas 1972
## 654 Americas 1977
## 655 Americas 1982
## 656 Americas 1987
## 657 Americas 1992
## 658 Americas 1997
## 659 Americas 2002
## 660 Americas 2007
## 661 Asia 1952
## 662 Asia 1957
## 663 Asia 1962
## 664 Asia 1967
## 665 Asia 1972
## 666 Asia 1977
## 667 Asia 1982
## 668 Asia 1987
## 669 Asia 1992
## 670 Asia 1997
## 671 Asia 2002
## 672 Asia 2007
## 673 Europe 1952
## 674 Europe 1957
## 675 Europe 1962
## 676 Europe 1967
## 677 Europe 1972
## 678 Europe 1977
## 679 Europe 1982
## 680 Europe 1987
## 681 Europe 1992
## 682 Europe 1997
## 683 Europe 2002
## 684 Europe 2007
## 685 Europe 1952
## 686 Europe 1957
## 687 Europe 1962
## 688 Europe 1967
## 689 Europe 1972
## 690 Europe 1977
## 691 Europe 1982
## 692 Europe 1987
## 693 Europe 1992
## 694 Europe 1997
## 695 Europe 2002
## 696 Europe 2007
## 697 Asia 1952
## 698 Asia 1957
## 699 Asia 1962
## 700 Asia 1967
## 701 Asia 1972
## 702 Asia 1977
## 703 Asia 1982
## 704 Asia 1987
## 705 Asia 1992
## 706 Asia 1997
## 707 Asia 2002
## 708 Asia 2007
## 709 Asia 1952
## 710 Asia 1957
## 711 Asia 1962
## 712 Asia 1967
## 713 Asia 1972
## 714 Asia 1977
## 715 Asia 1982
## 716 Asia 1987
## 717 Asia 1992
## 718 Asia 1997
## 719 Asia 2002
## 720 Asia 2007
## 721 Asia 1952
## 722 Asia 1957
## 723 Asia 1962
## 724 Asia 1967
## 725 Asia 1972
## 726 Asia 1977
## 727 Asia 1982
## 728 Asia 1987
## 729 Asia 1992
## 730 Asia 1997
## 731 Asia 2002
## 732 Asia 2007
## 733 Asia 1952
## 734 Asia 1957
## 735 Asia 1962
## 736 Asia 1967
## 737 Asia 1972
## 738 Asia 1977
## 739 Asia 1982
## 740 Asia 1987
## 741 Asia 1992
## 742 Asia 1997
## 743 Asia 2002
## 744 Asia 2007
## 745 Europe 1952
## 746 Europe 1957
## 747 Europe 1962
## 748 Europe 1967
## 749 Europe 1972
## 750 Europe 1977
## 751 Europe 1982
## 752 Europe 1987
## 753 Europe 1992
## 754 Europe 1997
## 755 Europe 2002
## 756 Europe 2007
## 757 Asia 1952
## 758 Asia 1957
## 759 Asia 1962
## 760 Asia 1967
## 761 Asia 1972
## 762 Asia 1977
## 763 Asia 1982
## 764 Asia 1987
## 765 Asia 1992
## 766 Asia 1997
## 767 Asia 2002
## 768 Asia 2007
## 769 Europe 1952
## 770 Europe 1957
## 771 Europe 1962
## 772 Europe 1967
## 773 Europe 1972
## 774 Europe 1977
## 775 Europe 1982
## 776 Europe 1987
## 777 Europe 1992
## 778 Europe 1997
## 779 Europe 2002
## 780 Europe 2007
## 781 Americas 1952
## 782 Americas 1957
## 783 Americas 1962
## 784 Americas 1967
## 785 Americas 1972
## 786 Americas 1977
## 787 Americas 1982
## 788 Americas 1987
## 789 Americas 1992
## 790 Americas 1997
## 791 Americas 2002
## 792 Americas 2007
## 793 Asia 1952
## 794 Asia 1957
## 795 Asia 1962
## 796 Asia 1967
## 797 Asia 1972
## 798 Asia 1977
## 799 Asia 1982
## 800 Asia 1987
## 801 Asia 1992
## 802 Asia 1997
## 803 Asia 2002
## 804 Asia 2007
## 805 Asia 1952
## 806 Asia 1957
## 807 Asia 1962
## 808 Asia 1967
## 809 Asia 1972
## 810 Asia 1977
## 811 Asia 1982
## 812 Asia 1987
## 813 Asia 1992
## 814 Asia 1997
## 815 Asia 2002
## 816 Asia 2007
## 817 Africa 1952
## 818 Africa 1957
## 819 Africa 1962
## 820 Africa 1967
## 821 Africa 1972
## 822 Africa 1977
## 823 Africa 1982
## 824 Africa 1987
## 825 Africa 1992
## 826 Africa 1997
## 827 Africa 2002
## 828 Africa 2007
## 829 Asia 1952
## 830 Asia 1957
## 831 Asia 1962
## 832 Asia 1967
## 833 Asia 1972
## 834 Asia 1977
## 835 Asia 1982
## 836 Asia 1987
## 837 Asia 1992
## 838 Asia 1997
## 839 Asia 2002
## 840 Asia 2007
## 841 Asia 1952
## 842 Asia 1957
## 843 Asia 1962
## 844 Asia 1967
## 845 Asia 1972
## 846 Asia 1977
## 847 Asia 1982
## 848 Asia 1987
## 849 Asia 1992
## 850 Asia 1997
## 851 Asia 2002
## 852 Asia 2007
## 853 Asia 1952
## 854 Asia 1957
## 855 Asia 1962
## 856 Asia 1967
## 857 Asia 1972
## 858 Asia 1977
## 859 Asia 1982
## 860 Asia 1987
## 861 Asia 1992
## 862 Asia 1997
## 863 Asia 2002
## 864 Asia 2007
## 865 Asia 1952
## 866 Asia 1957
## 867 Asia 1962
## 868 Asia 1967
## 869 Asia 1972
## 870 Asia 1977
## 871 Asia 1982
## 872 Asia 1987
## 873 Asia 1992
## 874 Asia 1997
## 875 Asia 2002
## 876 Asia 2007
## 877 Africa 1952
## 878 Africa 1957
## 879 Africa 1962
## 880 Africa 1967
## 881 Africa 1972
## 882 Africa 1977
## 883 Africa 1982
## 884 Africa 1987
## 885 Africa 1992
## 886 Africa 1997
## 887 Africa 2002
## 888 Africa 2007
## 889 Africa 1952
## 890 Africa 1957
## 891 Africa 1962
## 892 Africa 1967
## 893 Africa 1972
## 894 Africa 1977
## 895 Africa 1982
## 896 Africa 1987
## 897 Africa 1992
## 898 Africa 1997
## 899 Africa 2002
## 900 Africa 2007
## 901 Africa 1952
## 902 Africa 1957
## 903 Africa 1962
## 904 Africa 1967
## 905 Africa 1972
## 906 Africa 1977
## 907 Africa 1982
## 908 Africa 1987
## 909 Africa 1992
## 910 Africa 1997
## 911 Africa 2002
## 912 Africa 2007
## 913 Africa 1952
## 914 Africa 1957
## 915 Africa 1962
## 916 Africa 1967
## 917 Africa 1972
## 918 Africa 1977
## 919 Africa 1982
## 920 Africa 1987
## 921 Africa 1992
## 922 Africa 1997
## 923 Africa 2002
## 924 Africa 2007
## 925 Africa 1952
## 926 Africa 1957
## 927 Africa 1962
## 928 Africa 1967
## 929 Africa 1972
## 930 Africa 1977
## 931 Africa 1982
## 932 Africa 1987
## 933 Africa 1992
## 934 Africa 1997
## 935 Africa 2002
## 936 Africa 2007
## 937 Asia 1952
## 938 Asia 1957
## 939 Asia 1962
## 940 Asia 1967
## 941 Asia 1972
## 942 Asia 1977
## 943 Asia 1982
## 944 Asia 1987
## 945 Asia 1992
## 946 Asia 1997
## 947 Asia 2002
## 948 Asia 2007
## 949 Africa 1952
## 950 Africa 1957
## 951 Africa 1962
## 952 Africa 1967
## 953 Africa 1972
## 954 Africa 1977
## 955 Africa 1982
## 956 Africa 1987
## 957 Africa 1992
## 958 Africa 1997
## 959 Africa 2002
## 960 Africa 2007
## 961 Africa 1952
## 962 Africa 1957
## 963 Africa 1962
## 964 Africa 1967
## 965 Africa 1972
## 966 Africa 1977
## 967 Africa 1982
## 968 Africa 1987
## 969 Africa 1992
## 970 Africa 1997
## 971 Africa 2002
## 972 Africa 2007
## 973 Africa 1952
## 974 Africa 1957
## 975 Africa 1962
## 976 Africa 1967
## 977 Africa 1972
## 978 Africa 1977
## 979 Africa 1982
## 980 Africa 1987
## 981 Africa 1992
## 982 Africa 1997
## 983 Africa 2002
## 984 Africa 2007
## 985 Americas 1952
## 986 Americas 1957
## 987 Americas 1962
## 988 Americas 1967
## 989 Americas 1972
## 990 Americas 1977
## 991 Americas 1982
## 992 Americas 1987
## 993 Americas 1992
## 994 Americas 1997
## 995 Americas 2002
## 996 Americas 2007
## 997 Asia 1952
## 998 Asia 1957
## 999 Asia 1962
## 1000 Asia 1967
## 1001 Asia 1972
## 1002 Asia 1977
## 1003 Asia 1982
## 1004 Asia 1987
## 1005 Asia 1992
## 1006 Asia 1997
## 1007 Asia 2002
## 1008 Asia 2007
## 1009 Europe 1952
## 1010 Europe 1957
## 1011 Europe 1962
## 1012 Europe 1967
## 1013 Europe 1972
## 1014 Europe 1977
## 1015 Europe 1982
## 1016 Europe 1987
## 1017 Europe 1992
## 1018 Europe 1997
## 1019 Europe 2002
## 1020 Europe 2007
## 1021 Africa 1952
## 1022 Africa 1957
## 1023 Africa 1962
## 1024 Africa 1967
## 1025 Africa 1972
## 1026 Africa 1977
## 1027 Africa 1982
## 1028 Africa 1987
## 1029 Africa 1992
## 1030 Africa 1997
## 1031 Africa 2002
## 1032 Africa 2007
## 1033 Africa 1952
## 1034 Africa 1957
## 1035 Africa 1962
## 1036 Africa 1967
## 1037 Africa 1972
## 1038 Africa 1977
## 1039 Africa 1982
## 1040 Africa 1987
## 1041 Africa 1992
## 1042 Africa 1997
## 1043 Africa 2002
## 1044 Africa 2007
## 1045 Asia 1952
## 1046 Asia 1957
## 1047 Asia 1962
## 1048 Asia 1967
## 1049 Asia 1972
## 1050 Asia 1977
## 1051 Asia 1982
## 1052 Asia 1987
## 1053 Asia 1992
## 1054 Asia 1997
## 1055 Asia 2002
## 1056 Asia 2007
## 1057 Africa 1952
## 1058 Africa 1957
## 1059 Africa 1962
## 1060 Africa 1967
## 1061 Africa 1972
## 1062 Africa 1977
## 1063 Africa 1982
## 1064 Africa 1987
## 1065 Africa 1992
## 1066 Africa 1997
## 1067 Africa 2002
## 1068 Africa 2007
## 1069 Asia 1952
## 1070 Asia 1957
## 1071 Asia 1962
## 1072 Asia 1967
## 1073 Asia 1972
## 1074 Asia 1977
## 1075 Asia 1982
## 1076 Asia 1987
## 1077 Asia 1992
## 1078 Asia 1997
## 1079 Asia 2002
## 1080 Asia 2007
## 1081 Europe 1952
## 1082 Europe 1957
## 1083 Europe 1962
## 1084 Europe 1967
## 1085 Europe 1972
## 1086 Europe 1977
## 1087 Europe 1982
## 1088 Europe 1987
## 1089 Europe 1992
## 1090 Europe 1997
## 1091 Europe 2002
## 1092 Europe 2007
## 1093 Oceania 1952
## 1094 Oceania 1957
## 1095 Oceania 1962
## 1096 Oceania 1967
## 1097 Oceania 1972
## 1098 Oceania 1977
## 1099 Oceania 1982
## 1100 Oceania 1987
## 1101 Oceania 1992
## 1102 Oceania 1997
## 1103 Oceania 2002
## 1104 Oceania 2007
## 1105 Americas 1952
## 1106 Americas 1957
## 1107 Americas 1962
## 1108 Americas 1967
## 1109 Americas 1972
## 1110 Americas 1977
## 1111 Americas 1982
## 1112 Americas 1987
## 1113 Americas 1992
## 1114 Americas 1997
## 1115 Americas 2002
## 1116 Americas 2007
## 1117 Africa 1952
## 1118 Africa 1957
## 1119 Africa 1962
## 1120 Africa 1967
## 1121 Africa 1972
## 1122 Africa 1977
## 1123 Africa 1982
## 1124 Africa 1987
## 1125 Africa 1992
## 1126 Africa 1997
## 1127 Africa 2002
## 1128 Africa 2007
## 1129 Africa 1952
## 1130 Africa 1957
## 1131 Africa 1962
## 1132 Africa 1967
## 1133 Africa 1972
## 1134 Africa 1977
## 1135 Africa 1982
## 1136 Africa 1987
## 1137 Africa 1992
## 1138 Africa 1997
## 1139 Africa 2002
## 1140 Africa 2007
## 1141 Europe 1952
## 1142 Europe 1957
## 1143 Europe 1962
## 1144 Europe 1967
## 1145 Europe 1972
## 1146 Europe 1977
## 1147 Europe 1982
## 1148 Europe 1987
## 1149 Europe 1992
## 1150 Europe 1997
## 1151 Europe 2002
## 1152 Europe 2007
## 1153 Asia 1952
## 1154 Asia 1957
## 1155 Asia 1962
## 1156 Asia 1967
## 1157 Asia 1972
## 1158 Asia 1977
## 1159 Asia 1982
## 1160 Asia 1987
## 1161 Asia 1992
## 1162 Asia 1997
## 1163 Asia 2002
## 1164 Asia 2007
## 1165 Asia 1952
## 1166 Asia 1957
## 1167 Asia 1962
## 1168 Asia 1967
## 1169 Asia 1972
## 1170 Asia 1977
## 1171 Asia 1982
## 1172 Asia 1987
## 1173 Asia 1992
## 1174 Asia 1997
## 1175 Asia 2002
## 1176 Asia 2007
## 1177 Americas 1952
## 1178 Americas 1957
## 1179 Americas 1962
## 1180 Americas 1967
## 1181 Americas 1972
## 1182 Americas 1977
## 1183 Americas 1982
## 1184 Americas 1987
## 1185 Americas 1992
## 1186 Americas 1997
## 1187 Americas 2002
## 1188 Americas 2007
## 1189 Americas 1952
## 1190 Americas 1957
## 1191 Americas 1962
## 1192 Americas 1967
## 1193 Americas 1972
## 1194 Americas 1977
## 1195 Americas 1982
## 1196 Americas 1987
## 1197 Americas 1992
## 1198 Americas 1997
## 1199 Americas 2002
## 1200 Americas 2007
## 1201 Americas 1952
## 1202 Americas 1957
## 1203 Americas 1962
## 1204 Americas 1967
## 1205 Americas 1972
## 1206 Americas 1977
## 1207 Americas 1982
## 1208 Americas 1987
## 1209 Americas 1992
## 1210 Americas 1997
## 1211 Americas 2002
## 1212 Americas 2007
## 1213 Asia 1952
## 1214 Asia 1957
## 1215 Asia 1962
## 1216 Asia 1967
## 1217 Asia 1972
## 1218 Asia 1977
## 1219 Asia 1982
## 1220 Asia 1987
## 1221 Asia 1992
## 1222 Asia 1997
## 1223 Asia 2002
## 1224 Asia 2007
## 1225 Europe 1952
## 1226 Europe 1957
## 1227 Europe 1962
## 1228 Europe 1967
## 1229 Europe 1972
## 1230 Europe 1977
## 1231 Europe 1982
## 1232 Europe 1987
## 1233 Europe 1992
## 1234 Europe 1997
## 1235 Europe 2002
## 1236 Europe 2007
## 1237 Europe 1952
## 1238 Europe 1957
## 1239 Europe 1962
## 1240 Europe 1967
## 1241 Europe 1972
## 1242 Europe 1977
## 1243 Europe 1982
## 1244 Europe 1987
## 1245 Europe 1992
## 1246 Europe 1997
## 1247 Europe 2002
## 1248 Europe 2007
## 1249 Americas 1952
## 1250 Americas 1957
## 1251 Americas 1962
## 1252 Americas 1967
## 1253 Americas 1972
## 1254 Americas 1977
## 1255 Americas 1982
## 1256 Americas 1987
## 1257 Americas 1992
## 1258 Americas 1997
## 1259 Americas 2002
## 1260 Americas 2007
## 1261 Africa 1952
## 1262 Africa 1957
## 1263 Africa 1962
## 1264 Africa 1967
## 1265 Africa 1972
## 1266 Africa 1977
## 1267 Africa 1982
## 1268 Africa 1987
## 1269 Africa 1992
## 1270 Africa 1997
## 1271 Africa 2002
## 1272 Africa 2007
## 1273 Europe 1952
## 1274 Europe 1957
## 1275 Europe 1962
## 1276 Europe 1967
## 1277 Europe 1972
## 1278 Europe 1977
## 1279 Europe 1982
## 1280 Europe 1987
## 1281 Europe 1992
## 1282 Europe 1997
## 1283 Europe 2002
## 1284 Europe 2007
## 1285 Africa 1952
## 1286 Africa 1957
## 1287 Africa 1962
## 1288 Africa 1967
## 1289 Africa 1972
## 1290 Africa 1977
## 1291 Africa 1982
## 1292 Africa 1987
## 1293 Africa 1992
## 1294 Africa 1997
## 1295 Africa 2002
## 1296 Africa 2007
## 1297 Africa 1952
## 1298 Africa 1957
## 1299 Africa 1962
## 1300 Africa 1967
## 1301 Africa 1972
## 1302 Africa 1977
## 1303 Africa 1982
## 1304 Africa 1987
## 1305 Africa 1992
## 1306 Africa 1997
## 1307 Africa 2002
## 1308 Africa 2007
## 1309 Asia 1952
## 1310 Asia 1957
## 1311 Asia 1962
## 1312 Asia 1967
## 1313 Asia 1972
## 1314 Asia 1977
## 1315 Asia 1982
## 1316 Asia 1987
## 1317 Asia 1992
## 1318 Asia 1997
## 1319 Asia 2002
## 1320 Asia 2007
## 1321 Africa 1952
## 1322 Africa 1957
## 1323 Africa 1962
## 1324 Africa 1967
## 1325 Africa 1972
## 1326 Africa 1977
## 1327 Africa 1982
## 1328 Africa 1987
## 1329 Africa 1992
## 1330 Africa 1997
## 1331 Africa 2002
## 1332 Africa 2007
## 1333 Europe 1952
## 1334 Europe 1957
## 1335 Europe 1962
## 1336 Europe 1967
## 1337 Europe 1972
## 1338 Europe 1977
## 1339 Europe 1982
## 1340 Europe 1987
## 1341 Europe 1992
## 1342 Europe 1997
## 1343 Europe 2002
## 1344 Europe 2007
## 1345 Africa 1952
## 1346 Africa 1957
## 1347 Africa 1962
## 1348 Africa 1967
## 1349 Africa 1972
## 1350 Africa 1977
## 1351 Africa 1982
## 1352 Africa 1987
## 1353 Africa 1992
## 1354 Africa 1997
## 1355 Africa 2002
## 1356 Africa 2007
## 1357 Asia 1952
## 1358 Asia 1957
## 1359 Asia 1962
## 1360 Asia 1967
## 1361 Asia 1972
## 1362 Asia 1977
## 1363 Asia 1982
## 1364 Asia 1987
## 1365 Asia 1992
## 1366 Asia 1997
## 1367 Asia 2002
## 1368 Asia 2007
## 1369 Europe 1952
## 1370 Europe 1957
## 1371 Europe 1962
## 1372 Europe 1967
## 1373 Europe 1972
## 1374 Europe 1977
## 1375 Europe 1982
## 1376 Europe 1987
## 1377 Europe 1992
## 1378 Europe 1997
## 1379 Europe 2002
## 1380 Europe 2007
## 1381 Europe 1952
## 1382 Europe 1957
## 1383 Europe 1962
## 1384 Europe 1967
## 1385 Europe 1972
## 1386 Europe 1977
## 1387 Europe 1982
## 1388 Europe 1987
## 1389 Europe 1992
## 1390 Europe 1997
## 1391 Europe 2002
## 1392 Europe 2007
## 1393 Africa 1952
## 1394 Africa 1957
## 1395 Africa 1962
## 1396 Africa 1967
## 1397 Africa 1972
## 1398 Africa 1977
## 1399 Africa 1982
## 1400 Africa 1987
## 1401 Africa 1992
## 1402 Africa 1997
## 1403 Africa 2002
## 1404 Africa 2007
## 1405 Africa 1952
## 1406 Africa 1957
## 1407 Africa 1962
## 1408 Africa 1967
## 1409 Africa 1972
## 1410 Africa 1977
## 1411 Africa 1982
## 1412 Africa 1987
## 1413 Africa 1992
## 1414 Africa 1997
## 1415 Africa 2002
## 1416 Africa 2007
## 1417 Europe 1952
## 1418 Europe 1957
## 1419 Europe 1962
## 1420 Europe 1967
## 1421 Europe 1972
## 1422 Europe 1977
## 1423 Europe 1982
## 1424 Europe 1987
## 1425 Europe 1992
## 1426 Europe 1997
## 1427 Europe 2002
## 1428 Europe 2007
## 1429 Asia 1952
## 1430 Asia 1957
## 1431 Asia 1962
## 1432 Asia 1967
## 1433 Asia 1972
## 1434 Asia 1977
## 1435 Asia 1982
## 1436 Asia 1987
## 1437 Asia 1992
## 1438 Asia 1997
## 1439 Asia 2002
## 1440 Asia 2007
## 1441 Africa 1952
## 1442 Africa 1957
## 1443 Africa 1962
## 1444 Africa 1967
## 1445 Africa 1972
## 1446 Africa 1977
## 1447 Africa 1982
## 1448 Africa 1987
## 1449 Africa 1992
## 1450 Africa 1997
## 1451 Africa 2002
## 1452 Africa 2007
## 1453 Africa 1952
## 1454 Africa 1957
## 1455 Africa 1962
## 1456 Africa 1967
## 1457 Africa 1972
## 1458 Africa 1977
## 1459 Africa 1982
## 1460 Africa 1987
## 1461 Africa 1992
## 1462 Africa 1997
## 1463 Africa 2002
## 1464 Africa 2007
## 1465 Europe 1952
## 1466 Europe 1957
## 1467 Europe 1962
## 1468 Europe 1967
## 1469 Europe 1972
## 1470 Europe 1977
## 1471 Europe 1982
## 1472 Europe 1987
## 1473 Europe 1992
## 1474 Europe 1997
## 1475 Europe 2002
## 1476 Europe 2007
## 1477 Europe 1952
## 1478 Europe 1957
## 1479 Europe 1962
## 1480 Europe 1967
## 1481 Europe 1972
## 1482 Europe 1977
## 1483 Europe 1982
## 1484 Europe 1987
## 1485 Europe 1992
## 1486 Europe 1997
## 1487 Europe 2002
## 1488 Europe 2007
## 1489 Asia 1952
## 1490 Asia 1957
## 1491 Asia 1962
## 1492 Asia 1967
## 1493 Asia 1972
## 1494 Asia 1977
## 1495 Asia 1982
## 1496 Asia 1987
## 1497 Asia 1992
## 1498 Asia 1997
## 1499 Asia 2002
## 1500 Asia 2007
## 1501 Asia 1952
## 1502 Asia 1957
## 1503 Asia 1962
## 1504 Asia 1967
## 1505 Asia 1972
## 1506 Asia 1977
## 1507 Asia 1982
## 1508 Asia 1987
## 1509 Asia 1992
## 1510 Asia 1997
## 1511 Asia 2002
## 1512 Asia 2007
## 1513 Africa 1952
## 1514 Africa 1957
## 1515 Africa 1962
## 1516 Africa 1967
## 1517 Africa 1972
## 1518 Africa 1977
## 1519 Africa 1982
## 1520 Africa 1987
## 1521 Africa 1992
## 1522 Africa 1997
## 1523 Africa 2002
## 1524 Africa 2007
## 1525 Asia 1952
## 1526 Asia 1957
## 1527 Asia 1962
## 1528 Asia 1967
## 1529 Asia 1972
## 1530 Asia 1977
## 1531 Asia 1982
## 1532 Asia 1987
## 1533 Asia 1992
## 1534 Asia 1997
## 1535 Asia 2002
## 1536 Asia 2007
## 1537 Africa 1952
## 1538 Africa 1957
## 1539 Africa 1962
## 1540 Africa 1967
## 1541 Africa 1972
## 1542 Africa 1977
## 1543 Africa 1982
## 1544 Africa 1987
## 1545 Africa 1992
## 1546 Africa 1997
## 1547 Africa 2002
## 1548 Africa 2007
## 1549 Americas 1952
## 1550 Americas 1957
## 1551 Americas 1962
## 1552 Americas 1967
## 1553 Americas 1972
## 1554 Americas 1977
## 1555 Americas 1982
## 1556 Americas 1987
## 1557 Americas 1992
## 1558 Americas 1997
## 1559 Americas 2002
## 1560 Americas 2007
## 1561 Africa 1952
## 1562 Africa 1957
## 1563 Africa 1962
## 1564 Africa 1967
## 1565 Africa 1972
## 1566 Africa 1977
## 1567 Africa 1982
## 1568 Africa 1987
## 1569 Africa 1992
## 1570 Africa 1997
## 1571 Africa 2002
## 1572 Africa 2007
## 1573 Europe 1952
## 1574 Europe 1957
## 1575 Europe 1962
## 1576 Europe 1967
## 1577 Europe 1972
## 1578 Europe 1977
## 1579 Europe 1982
## 1580 Europe 1987
## 1581 Europe 1992
## 1582 Europe 1997
## 1583 Europe 2002
## 1584 Europe 2007
## 1585 Africa 1952
## 1586 Africa 1957
## 1587 Africa 1962
## 1588 Africa 1967
## 1589 Africa 1972
## 1590 Africa 1977
## 1591 Africa 1982
## 1592 Africa 1987
## 1593 Africa 1992
## 1594 Africa 1997
## 1595 Africa 2002
## 1596 Africa 2007
## 1597 Europe 1952
## 1598 Europe 1957
## 1599 Europe 1962
## 1600 Europe 1967
## 1601 Europe 1972
## 1602 Europe 1977
## 1603 Europe 1982
## 1604 Europe 1987
## 1605 Europe 1992
## 1606 Europe 1997
## 1607 Europe 2002
## 1608 Europe 2007
## 1609 Americas 1952
## 1610 Americas 1957
## 1611 Americas 1962
## 1612 Americas 1967
## 1613 Americas 1972
## 1614 Americas 1977
## 1615 Americas 1982
## 1616 Americas 1987
## 1617 Americas 1992
## 1618 Americas 1997
## 1619 Americas 2002
## 1620 Americas 2007
## 1621 Americas 1952
## 1622 Americas 1957
## 1623 Americas 1962
## 1624 Americas 1967
## 1625 Americas 1972
## 1626 Americas 1977
## 1627 Americas 1982
## 1628 Americas 1987
## 1629 Americas 1992
## 1630 Americas 1997
## 1631 Americas 2002
## 1632 Americas 2007
## 1633 Americas 1952
## 1634 Americas 1957
## 1635 Americas 1962
## 1636 Americas 1967
## 1637 Americas 1972
## 1638 Americas 1977
## 1639 Americas 1982
## 1640 Americas 1987
## 1641 Americas 1992
## 1642 Americas 1997
## 1643 Americas 2002
## 1644 Americas 2007
## 1645 Asia 1952
## 1646 Asia 1957
## 1647 Asia 1962
## 1648 Asia 1967
## 1649 Asia 1972
## 1650 Asia 1977
## 1651 Asia 1982
## 1652 Asia 1987
## 1653 Asia 1992
## 1654 Asia 1997
## 1655 Asia 2002
## 1656 Asia 2007
## 1657 Asia 1952
## 1658 Asia 1957
## 1659 Asia 1962
## 1660 Asia 1967
## 1661 Asia 1972
## 1662 Asia 1977
## 1663 Asia 1982
## 1664 Asia 1987
## 1665 Asia 1992
## 1666 Asia 1997
## 1667 Asia 2002
## 1668 Asia 2007
## 1669 Asia 1952
## 1670 Asia 1957
## 1671 Asia 1962
## 1672 Asia 1967
## 1673 Asia 1972
## 1674 Asia 1977
## 1675 Asia 1982
## 1676 Asia 1987
## 1677 Asia 1992
## 1678 Asia 1997
## 1679 Asia 2002
## 1680 Asia 2007
## 1681 Africa 1952
## 1682 Africa 1957
## 1683 Africa 1962
## 1684 Africa 1967
## 1685 Africa 1972
## 1686 Africa 1977
## 1687 Africa 1982
## 1688 Africa 1987
## 1689 Africa 1992
## 1690 Africa 1997
## 1691 Africa 2002
## 1692 Africa 2007
## 1693 Africa 1952
## 1694 Africa 1957
## 1695 Africa 1962
## 1696 Africa 1967
## 1697 Africa 1972
## 1698 Africa 1977
## 1699 Africa 1982
## 1700 Africa 1987
## 1701 Africa 1992
## 1702 Africa 1997
## 1703 Africa 2002
## 1704 Africa 2007
gap[ , continent]
## Error in `[.data.frame`(gap, , continent): object 'continent' not found
This fails because continent and year are not variables in our current environment! dplyr does some fancy stuff behind the scenes to save us from typing the quotes.
This is fine for an interactive data analysis workflow, but if you want to write a function that, for example, selects an arbitrary set of columns, you'll run into trouble.
## here's a helper function that computes the mean of a variable, stratifying by a grouping variable
grouped_mean <- function(data, group_var, summary_var) {
  data %>%
    group_by(group_var) %>%
    summarise(mean = mean(summary_var))
}
gap %>% grouped_mean(continent, lifeExp)
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown
gap %>% grouped_mean('continent', 'lifeExp')
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown
See the rlang or seplyr packages for ways to deal with this problem when you write your own functions around dplyr.
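One possible way to make a helper like grouped_mean() work is the "embrace" operator {{ }} from rlang, which dplyr re-exports. The sketch below assumes rlang 0.4.0 or later and a dplyr version that supports {{ }}; the data frame toy and its values are made up for illustration and are not part of the gapminder data.

```r
library(dplyr)

# Hypothetical toy data, standing in for the gapminder data frame
toy <- data.frame(
  continent = c("Asia", "Asia", "Africa", "Africa"),
  lifeExp   = c(60, 70, 50, 54),
  stringsAsFactors = FALSE
)

# {{ }} passes the unquoted column names supplied by the caller
# through to group_by() and summarise()
grouped_mean <- function(data, group_var, summary_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(mean = mean({{ summary_var }}))
}

result <- toy %>% grouped_mean(continent, lifeExp)
result
```

With this version the column names are still passed unquoted, as in ordinary dplyr calls. Passing column names as strings would need a different approach.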
Even before we conduct analysis or calculations, we need to put our data into the correct format. The goal here is to rearrange a messy dataset into one that is tidy.
The two most important properties of tidy data are that each column is a variable and each row is an observation.
Tidy data is easier to work with, because you have a consistent way of referring to variables (as column names) and observations (as row indices). It then becomes easy to manipulate, visualize, and model.
For more on the concept of tidy data, read Hadley Wickham's paper "Tidy Data" (Journal of Statistical Software, 2014).
"Tidy datasets are all alike but every messy dataset is messy in its own way." – Hadley Wickham
Tabular datasets can be arranged in many ways. For instance, consider the data below. Each data set displays information on heart rate observed in individuals across 3 different time periods. But the data are organized differently in each table.
wide <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory"),
  time1 = c(67, 80, 64),
  time2 = c(56, 90, 50),
  time3 = c(70, 67, 101)
)
wide
## name time1 time2 time3
## 1 Wilbur 67 56 70
## 2 Petunia 80 90 67
## 3 Gregory 64 50 101
long <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory", "Wilbur", "Petunia", "Gregory", "Wilbur", "Petunia", "Gregory"),
  time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  heartrate = c(67, 80, 64, 56, 90, 50, 70, 67, 101)
)
long
## name time heartrate
## 1 Wilbur 1 67
## 2 Petunia 1 80
## 3 Gregory 1 64
## 4 Wilbur 2 56
## 5 Petunia 2 90
## 6 Gregory 2 50
## 7 Wilbur 3 70
## 8 Petunia 3 67
## 9 Gregory 3 101
Question: Which one of these do you think is the tidy format?
Answer: The first dataframe (the "wide" one) would not be considered tidy because values (i.e., heartrate) are spread across multiple columns.
We often refer to these different structures as "long" vs. "wide" formats. In the "long" format, you usually have 1 column for the observed variable and the other columns are ID variables.
For the "wide" format each row is often a site/subject/patient and you have multiple observation variables containing the same type of data. These can be either repeated observations over time, or observation of multiple variables (or a mix of both). In the above case, we had the same kind of data (heart rate) entered across 3 different columns, corresponding to three different time periods.
You may find that data entry is simpler in the "wide" format, and some programs/functions prefer it. However, many of R's functions have been designed assuming you have "long" format data.
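As a small illustration of why the long format is convenient, the sketch below computes each individual's mean heart rate with a single grouped summary. It assumes dplyr is installed; the data frame hr_long is a hypothetical copy of the heart rate example, using the values from the wide table above.

```r
library(dplyr)

# Long-format heart rate data: one row per (name, time) observation
hr_long <- data.frame(
  name      = rep(c("Wilbur", "Petunia", "Gregory"), times = 3),
  time      = rep(1:3, each = 3),
  heartrate = c(67, 80, 64, 56, 90, 50, 70, 67, 101),
  stringsAsFactors = FALSE
)

# Because all measurements live in one column, a grouped summary
# works the same way no matter how many time points there are
mean_hr <- hr_long %>%
  group_by(name) %>%
  summarize(mean_heartrate = mean(heartrate))
mean_hr
```

In the wide format the same calculation would have to name every time column explicitly (or use something like rowMeans()), and would need to change whenever a time point is added.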
Let's look at the structure of our original gapminder dataframe:
head(gap)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.801 779.4453
## 2 Afghanistan 1957 9240934 Asia 30.332 820.8530
## 3 Afghanistan 1962 10267083 Asia 31.997 853.1007
## 4 Afghanistan 1967 11537966 Asia 34.020 836.1971
## 5 Afghanistan 1972 13079460 Asia 36.088 739.9811
## 6 Afghanistan 1977 14880372 Asia 38.438 786.1134
Question: Is this data frame wide or long?
Answer: This data frame is somewhere in between the purely 'long' and 'wide' formats. We have 3 "ID variables" (continent, country, year) and 3 "Observation variables" (pop, lifeExp, gdpPercap).
Despite not having ALL observations in 1 column, this intermediate format makes sense given that all 3 observation variables have different units. As we have seen, many of the functions in R are often vector based, and you usually do not want to do mathematical operations on values with different units.
On the other hand, there are some instances in which a purely long or wide format is ideal (e.g. plotting). Likewise, sometimes you'll get data on your desk that is poorly organized, and you'll need to reshape it.
tidyr
Thankfully, the tidyr package will help you efficiently transform your data regardless of original format.
# Install the "tidyr" package (only necessary one time)
# install.packages("tidyr") # Not Run
# Load the "tidyr" package (necessary every new R session)
library(tidyr)
tidyr::gather
Until now, we've been using the nicely formatted original gapminder data set. This data set is not quite wide and not quite long; it's something in the middle. "Real" data (i.e., our own research data) will rarely be so well organized. Here let's start with the wide format version of the gapminder data set.
gap_wide <- read.csv("../data/gapminder_wide.csv", stringsAsFactors = FALSE)
head(gap_wide)
## continent country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962
## 1 Africa Algeria 2449.0082 3013.9760 2550.8169
## 2 Africa Angola 3520.6103 3827.9405 4269.2767
## 3 Africa Benin 1062.7522 959.6011 949.4991
## 4 Africa Botswana 851.2411 918.2325 983.6540
## 5 Africa Burkina Faso 543.2552 617.1835 722.5120
## 6 Africa Burundi 339.2965 379.5646 355.2032
## gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982
## 1 3246.9918 4182.6638 4910.4168 5745.1602
## 2 5522.7764 5473.2880 3008.6474 2756.9537
## 3 1035.8314 1085.7969 1029.1613 1277.8976
## 4 1214.7093 2263.6111 3214.8578 4551.1421
## 5 794.8266 854.7360 743.3870 807.1986
## 6 412.9775 464.0995 556.1033 559.6032
## gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002
## 1 5681.3585 5023.2166 4797.2951 5288.0404
## 2 2430.2083 2627.8457 2277.1409 2773.2873
## 3 1225.8560 1191.2077 1232.9753 1372.8779
## 4 6205.8839 7954.1116 8647.1423 11003.6051
## 5 912.0631 931.7528 946.2950 1037.6452
## 6 621.8188 631.6999 463.1151 446.4035
## gdpPercap_2007 lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
## 1 6223.3675 43.077 45.685 48.303 51.407
## 2 4797.2313 30.015 31.999 34.000 35.985
## 3 1441.2849 38.223 40.358 42.618 44.885
## 4 12569.8518 47.622 49.618 51.520 53.298
## 5 1217.0330 31.975 34.906 37.814 40.697
## 6 430.0707 39.031 40.533 42.045 43.548
## lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987 lifeExp_1992
## 1 54.518 58.014 61.368 65.799 67.744
## 2 37.928 39.483 39.942 39.906 40.647
## 3 47.014 49.190 50.904 52.337 53.919
## 4 56.024 59.319 61.484 63.622 62.745
## 5 43.591 46.137 48.122 49.557 50.260
## 6 44.057 45.910 47.471 48.211 44.736
## lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962
## 1 69.152 70.994 72.301 9279525 10270856 11000948
## 2 40.963 41.003 42.731 4232095 4561361 4826015
## 3 54.777 54.406 56.728 1738315 1925173 2151895
## 4 52.556 46.634 50.728 442308 474639 512764
## 5 50.324 50.650 52.295 4469979 4713416 4919632
## 6 45.326 47.360 49.580 2445618 2667518 2961915
## pop_1967 pop_1972 pop_1977 pop_1982 pop_1987 pop_1992 pop_1997 pop_2002
## 1 12760499 14760787 17152804 20033753 23254956 26298373 29072015 31287142
## 2 5247469 5894858 6162675 7016384 7874230 8735988 9875024 10866106
## 3 2427334 2761407 3168267 3641603 4243788 4981671 6066080 7026113
## 4 553541 619351 781472 970347 1151184 1342614 1536536 1630347
## 5 5127935 5433886 5889574 6634596 7586551 8878303 10352843 12251209
## 6 3330989 3529983 3834415 4580410 5126023 5809236 6121610 7021078
## pop_2007
## 1 33333216
## 2 12420476
## 3 8078314
## 4 1639131
## 5 14326203
## 6 8390505
The first step towards getting our nice intermediate data format is to convert from the wide to the long format. The function gather() will 'gather' the observation variables into a single variable. This is sometimes called "melting" your data, because it melts the table from wide to long. Those data will be melted into two variables: one for the variable names, and the other for the variable values.
gap_long <- gap_wide %>%
  gather(obstype_year, obs_values, 3:38)
head(gap_long)
## continent country obstype_year obs_values
## 1 Africa Algeria gdpPercap_1952 2449.0082
## 2 Africa Angola gdpPercap_1952 3520.6103
## 3 Africa Benin gdpPercap_1952 1062.7522
## 4 Africa Botswana gdpPercap_1952 851.2411
## 5 Africa Burkina Faso gdpPercap_1952 543.2552
## 6 Africa Burundi gdpPercap_1952 339.2965
Notice that we put 3 arguments into the gather() function:
the name of the new column for the new ID variable (obstype_year),
the name of the new amalgamated observation variable (obs_values),
the indices of the old observation variables (3:38, signalling columns 3 through 38) that we want to gather into one variable. Notice that we don't want to melt down columns 1 and 2, as these are considered "ID" variables.
tidyr::select
If there are a lot of columns or they're named in a consistent pattern, we might not want to select them using the column numbers. It'd be easier to use some information contained in the names themselves. We can select variables using:
x:z to select all variables between x and z
-y to exclude y
starts_with(x, ignore.case = TRUE): all names that start with x
ends_with(x, ignore.case = TRUE): all names that end with x
contains(x, ignore.case = TRUE): all names that contain x
See the select() function in dplyr for more options.
For instance, here we do the same gather operation with (1) the starts_with function, and (2) the - operator:
# with the starts_with() function
gap_long <- gap_wide %>%
  gather(obstype_year, obs_values, starts_with('pop'),
         starts_with('lifeExp'), starts_with('gdpPercap'))
head(gap_long)
## continent country obstype_year obs_values
## 1 Africa Algeria pop_1952 9279525
## 2 Africa Angola pop_1952 4232095
## 3 Africa Benin pop_1952 1738315
## 4 Africa Botswana pop_1952 442308
## 5 Africa Burkina Faso pop_1952 4469979
## 6 Africa Burundi pop_1952 2445618
# with the - operator
gap_long <- gap_wide %>%
  gather(obstype_year, obs_values, -continent, -country)
head(gap_long)
## continent country obstype_year obs_values
## 1 Africa Algeria gdpPercap_1952 2449.0082
## 2 Africa Angola gdpPercap_1952 3520.6103
## 3 Africa Benin gdpPercap_1952 1062.7522
## 4 Africa Botswana gdpPercap_1952 851.2411
## 5 Africa Burkina Faso gdpPercap_1952 543.2552
## 6 Africa Burundi gdpPercap_1952 339.2965
However you choose to do it, notice that the output collapses all of the measure variables into two columns: one containing the new ID variable, the other containing the observation value for that row.
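The selection helpers listed earlier can also be tried on their own with dplyr's select(), which may make it easier to see which columns each one picks up. This is a minimal sketch on a made-up data frame (toy_wide is hypothetical and only mimics the column naming of gap_wide).

```r
library(dplyr)

# Hypothetical miniature of the wide gapminder layout
toy_wide <- data.frame(
  country      = c("Algeria", "Angola"),
  pop_1952     = c(9279525, 4232095),
  pop_1957     = c(10270856, 4561361),
  lifeExp_1952 = c(43.077, 30.015),
  lifeExp_1957 = c(45.685, 31.999),
  stringsAsFactors = FALSE
)

pop_cols   <- toy_wide %>% select(starts_with("pop"))        # names beginning with "pop"
cols_1952  <- toy_wide %>% select(ends_with("1952"))         # names ending in "1952"
exp_cols   <- toy_wide %>% select(contains("Exp"))           # names containing "Exp"
range_cols <- toy_wide %>% select(pop_1952:lifeExp_1952)     # a contiguous range of columns
no_country <- toy_wide %>% select(-country)                  # everything except country

names(pop_cols)
names(range_cols)
```

Checking names() on each result is a quick way to confirm a selection before using the same helper inside gather().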
tidyr::separate
You'll notice that in our long dataset, obstype_year actually contains 2 pieces of information, the observation type (pop, lifeExp, or gdpPercap) and the year.
We can use the separate() function to split the character strings into multiple variables:
gap_long_sep <- gap_long %>%
  separate(obstype_year, into = c('obs_type','year'), sep = "_") %>%
  mutate(year = as.integer(year))
head(gap_long_sep)
## continent country obs_type year obs_values
## 1 Africa Algeria gdpPercap 1952 2449.0082
## 2 Africa Angola gdpPercap 1952 3520.6103
## 3 Africa Benin gdpPercap 1952 1062.7522
## 4 Africa Botswana gdpPercap 1952 851.2411
## 5 Africa Burkina Faso gdpPercap 1952 543.2552
## 6 Africa Burundi gdpPercap 1952 339.2965
If you didn't use tidyr to do this, you'd have to use the strsplit function and use multiple lines of code to replace the column in gap_long with two new columns. This solution is much cleaner.
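For comparison, here is a rough sketch of what the base R route with strsplit() might look like. The data frame toy_long is hypothetical and only mimics the layout of gap_long; the column names obs_type and year are chosen to match the separate() call above.

```r
# Hypothetical two-row stand-in for gap_long
toy_long <- data.frame(
  country      = c("Algeria", "Algeria"),
  obstype_year = c("gdpPercap_1952", "lifeExp_1957"),
  obs_values   = c(2449.0082, 45.685),
  stringsAsFactors = FALSE
)

# Split each string at the underscore; the result is a list of
# two-element character vectors
parts <- strsplit(toy_long$obstype_year, "_", fixed = TRUE)

# Pull out the first and second piece of every split
toy_long$obs_type <- sapply(parts, `[`, 1)
toy_long$year     <- as.integer(sapply(parts, `[`, 2))

# Drop the original combined column
toy_long$obstype_year <- NULL
toy_long
```

This takes several steps and leaves the new columns at the end of the data frame, whereas separate() replaces the original column in place.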
tidyr::spread
The opposite of gather() is spread(). It spreads our observation variables back out to make a wider table. We can use this function to spread our gap_long_sep data frame back to the original "medium" format.
gap_medium <- gap_long_sep %>%
  spread(obs_type, obs_values)
head(gap_medium)
## continent country year gdpPercap lifeExp pop
## 1 Africa Algeria 1952 2449.008 43.077 9279525
## 2 Africa Algeria 1957 3013.976 45.685 10270856
## 3 Africa Algeria 1962 2550.817 48.303 11000948
## 4 Africa Algeria 1967 3246.992 51.407 12760499
## 5 Africa Algeria 1972 4182.664 54.518 14760787
## 6 Africa Algeria 1977 4910.417 58.014 17152804
All we need are a few quick fixes to make this dataset identical to the original gap dataset:
gap <- read.csv("../data/gapminder-FiveYearData.csv")
head(gap_medium)
## continent country year gdpPercap lifeExp pop
## 1 Africa Algeria 1952 2449.008 43.077 9279525
## 2 Africa Algeria 1957 3013.976 45.685 10270856
## 3 Africa Algeria 1962 2550.817 48.303 11000948
## 4 Africa Algeria 1967 3246.992 51.407 12760499
## 5 Africa Algeria 1972 4182.664 54.518 14760787
## 6 Africa Algeria 1977 4910.417 58.014 17152804
head(gap)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.801 779.4453
## 2 Afghanistan 1957 9240934 Asia 30.332 820.8530
## 3 Afghanistan 1962 10267083 Asia 31.997 853.1007
## 4 Afghanistan 1967 11537966 Asia 34.020 836.1971
## 5 Afghanistan 1972 13079460 Asia 36.088 739.9811
## 6 Afghanistan 1977 14880372 Asia 38.438 786.1134
# rearrange columns
gap_medium <- gap_medium[, names(gap)]
head(gap_medium)
## country year pop continent lifeExp gdpPercap
## 1 Algeria 1952 9279525 Africa 43.077 2449.008
## 2 Algeria 1957 10270856 Africa 45.685 3013.976
## 3 Algeria 1962 11000948 Africa 48.303 2550.817
## 4 Algeria 1967 12760499 Africa 51.407 3246.992
## 5 Algeria 1972 14760787 Africa 54.518 4182.664
## 6 Algeria 1977 17152804 Africa 58.014 4910.417
# arrange by country, continent, and year
gap_medium <- gap_medium %>%
  arrange(country, continent, year)
head(gap_medium)
## country year pop continent lifeExp gdpPercap
## 1 Afghanistan 1952 8425333 Asia 28.801 779.4453
## 2 Afghanistan 1957 9240934 Asia 30.332 820.8530
## 3 Afghanistan 1962 10267083 Asia 31.997 853.1007
## 4 Afghanistan 1967 11537966 Asia 34.020 836.1971
## 5 Afghanistan 1972 13079460 Asia 36.088 739.9811
## 6 Afghanistan 1977 14880372 Asia 38.438 786.1134
gather and spread are being replaced by pivot_longer and pivot_wider, which use ideas from the cdata package to make reshaping easier to think about.
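As a rough sketch of the newer interface, the heart rate example could be reshaped as follows. This assumes tidyr 1.0.0 or later, where pivot_longer() and pivot_wider() are available; the object names hr_wide, hr_long, and hr_wide_again are made up for illustration.

```r
library(tidyr)

# The wide heart rate table from earlier in this module
hr_wide <- data.frame(
  name  = c("Wilbur", "Petunia", "Gregory"),
  time1 = c(67, 80, 64),
  time2 = c(56, 90, 50),
  time3 = c(70, 67, 101),
  stringsAsFactors = FALSE
)

# Wide to long: column names go to 'time' (with the "time" prefix
# stripped), cell values go to 'heartrate'
hr_long <- pivot_longer(hr_wide,
                        cols = starts_with("time"),
                        names_to = "time",
                        names_prefix = "time",
                        values_to = "heartrate")
hr_long$time <- as.integer(hr_long$time)

# Long back to wide: one column per time point, prefix restored
hr_wide_again <- pivot_wider(hr_long,
                             names_from = time,
                             values_from = heartrate,
                             names_prefix = "time")
hr_wide_again
```

The arguments names_to / values_to and names_from / values_from play the roles of the key and value arguments in gather() and spread(). Note that both functions return tibbles rather than plain data frames.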
dplyr and tidyr have many more functions to help you wrangle and manipulate your data. See the Data Wrangling Cheat Sheet for more.
There are some other useful packages in the tidyverse:
ggplot2 for plotting (I'll cover this in module 8)
readr and haven for reading in data with structure other than csv
stringr, lubridate, forcats for manipulating strings, dates, and factors, respectively
dplyr
Use dplyr to create a data frame containing the median lifeExp for each continent.
Use dplyr to add a column to the gapminder dataset that contains the total population of the continent of each observation in a given year. For example, if the first observation is Afghanistan in 1952, the new column would contain the population of Asia in 1952.
Use dplyr to add a column called gdpPercap_diff that contains the difference between the observation's gdpPercap and the mean gdpPercap of the continent in that year. Arrange the dataframe by the column you just created, in descending order (so that the relatively richest country/years are listed first).
tidyr
Starting from the output of problem 3, keep only the country, year, and gdpPercap_diff columns. Use tidyr to put it in wide format so that countries are rows and years are columns.
Hint: you'll probably see a message about a missing grouping variable. If you don't want continent included, you can pass the output of problem 3 through ungroup() to get rid of the continent information.